Maskbits #608
Conversation
Ok, I've added the hooks to propagate e.g. legacy_dir from the config script. If provided, the code runs the maskbits lookup & realistic depths based on this directory. I think this is the minimum fix, but with the inheritance it's not clear whether there was a better way - I figured I was mirroring the survey flag propagation, so it must be ok. If not provided, it resorts to the simple model that was in there previously. I'm not sure how to handle warnings etc. if this directory is wrong or incomplete. For instance, there is already an expectation that some files will be missing, as this occurs for failed legacy reductions of bricks. In this case, my code removes the mock targets in these bricks. Maybe @geordie666 or others can suggest what's reasonable, given that this stage is just piggy-backing on existing code. Currently, this branch just fails spectacularly if legacy_dir points to an empty directory, as the remaining production finds no targets to process.
In the case that you fail spectacularly on a missing brick, could you inherit all of the information you can, then set every other column to zero and set MASKBITS to BAILOUT? At least then you could maintain all of the other mock-brick information, but correctly indicate downstream that this is really a "missing" brick.
Sorry, I was unclear in my initial post (since edited). Technically, it fails spectacularly if a requested healpixel contains no processed bricks. Otherwise, I've successfully been removing mock targets in unprocessed bricks with the code continuing gracefully. This is only a 'new' problem in the sense that legacy_dir, or the requested healpixel, might be so entirely wrong that the spectacular failure occurs. I can easily initialize MASKBITS to 2**10 instead of removing the targets. This would at least allow the code to continue through part of the problem above. Arguably, I'd say the randoms should mirror the same behavior, which isn't currently the case? This doesn't handle the root problem, however: we're assuming that when copying legacy_dir locally - perhaps trying to guess the subset of bricks necessary for a given set of healpixels - the user has been successful. With the above, we'll just be flagging these targets as sitting in a missing brick, when the brick could in fact exist at NERSC. My not-so-subtle point is that I don't think the randoms support local generation in this way. Otherwise, I think this would be easier. If we're assuming an entire copy of legacy_dir has been taken across, then it's straightforward. I'm just not sure that is reasonable? Thanks!
The randoms have both behaviors (see desitarget/py/desitarget/randoms.py, lines 401 to 406 at c0b31a2), but the randoms are still processed if the other imaging "stack" files are found. So, it's sort of a mixed approach depending on what information, if any, is available. I see your wider point, though. You might want to check which brick-names exist for the data release (see desitarget/py/desitarget/randoms.py, lines 1319 to 1324 at c0b31a2), and then take the set of brick-names that are included in a particular mock HEALPixel and only assign imaging information to bricks that have a brick-name in that list. The randoms do currently support "local" assignment with missing bricks in this manner.
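A minimal sketch of that kind of check, assuming the survey-bricks file and using desiutil.brick to name the brick at each position (the toy coordinates are illustrative only):

import numpy as np
from astropy.io import fits
from desiutil.brick import Bricks

# Brick names that were actually processed in this data release.
drbricks = set(fits.getdata('survey-bricks.fits.gz', 1)['BRICKNAME'])

# Toy mock-target positions (degrees).
ra = np.array([42.0, 120.5, 359.9])
dec = np.array([-11.7, 30.2, 0.1])

# Name the brick containing each mock target...
bricknames = Bricks(bricksize=0.25).brickname(ra, dec)

# ...and only assign imaging information where that brick exists in the release.
inrelease = np.array([name in drbricks for name in bricknames])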
All of that is very useful; I hadn't appreciated a lot of it! But I think the bit missing for me is a file that contains what brick names should look like at NERSC. This is a file that we can require to exist locally, and that we can check against to determine whether a given file should exist locally when processing a pixel. Is there an easy way to go from a HEALPixel to a set of brick names?
Regarding what brick names should look like at NERSC: this code shows you the correct directory structure (desitarget/py/desitarget/randoms.py, lines 329 to 330 at c0b31a2) and, in theory, the survey bricks file should exist to tell you what brick names are in a given data release (desitarget/py/desitarget/skyfibers.py, lines 105 to 108 at c0b31a2).
Regarding going from a HEALPixel to a set of brick names: I have code to do a bunch of possible bricks-in-a-HEALPixel look-ups. But bricks aren't exactly bounded by great or small circles on the sphere, so you have to be careful and rigorous. Some options are:
1. bricks whose centers lie within a given HEALPixel;
2. bricks that overlap a given HEALPixel at all.
If you let me know which one of these you're looking for, I can trawl through my existing code.
Bricks do have the nice property of being bounded by great circles (lines of longitude) and small circles (lines of latitude). Healpix doesn't have this property, which is what makes this hard. I wonder if Ted knows if this is a solved problem (overlaps of Healpix pixels with lines of longitude and latitude)?
@djschlegel: If it's definitely the overlaps that @michaelJwilson wants (option 2), then yes, it's a solved problem. For example, using healpy's query_polygon:
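A minimal sketch of such a look-up, assuming the RA1/RA2/DEC1/DEC2 brick bounds from the survey-bricks file and ignoring pole-brick edge cases:

import healpy as hp
from astropy.io import fits

nside = 8

# Each brick's rectangular bounds from the survey-bricks file.
bricks = fits.getdata('survey-bricks.fits.gz', 1)

lookup = {}
for brick in bricks:
    ras = [brick['RA1'], brick['RA2'], brick['RA2'], brick['RA1']]
    decs = [brick['DEC1'], brick['DEC1'], brick['DEC2'], brick['DEC2']]
    # Corner vectors of the brick; inclusive=True returns every pixel that
    # overlaps the polygon, not just pixels whose centers fall inside it.
    vecs = hp.ang2vec(ras, decs, lonlat=True)
    for pix in hp.query_polygon(nside, vecs, inclusive=True, nest=True):
        lookup.setdefault(pix, []).append(brick['BRICKNAME'])

# Reverse look-up: bricks that "overlap" a given HEALPixel (pixel 32 here).
print(lookup.get(32, []))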
It takes about 100s to construct the look-up dictionary, but then the reverse look-ups are very fast - for example, finding which bricks "overlap" with a given HEALPixel.
This uses https://healpy.readthedocs.io/en/latest/generated/healpy.query_polygon.html
(actually, …)
Adam, thanks. I've implemented the following changes:
- if legacy_dir is provided in the config, require that survey-bricks.fits.gz be there. Hopefully, that makes sense. The remaining "problem" is some simple means to scrape from NERSC the necessary files for a given healpixel.
- what is the nside scaling of this brick table construction?
The brick table construction (and my snippet of code, above) will work at any nside. For the "glob" you're discussing... I could set things up so that the list of necessary files is constructed first in the code (or, I could make the relevant function optionally just return that list of files).
Adam, at this point the requirement on this branch is to be able to run it on a laptop. The limiting factor is now how we facilitate some sort of 'get', given that it's not entirely trivial to know what files you need short of copying the entire legacy release. You'll see the relevant changes in the two commits directly above. In particular, I added a get_legacy.py script under mock/ that implements your brick table and attempts something like a transfer. I'm going to yield to you, Stephen & John on how this is best done. I can happily copy across the necessary code, as in get_legacy. I'd suggest it'd be better not to just replicate code entirely in this manner, but whatever - feel free to hijack or tell me what you think is best. You quoted an nside of 8, but I believe the default in select_mock_targets is 64. Is that a linear hit in runtime on your 100s?
Mike: why don't I start a new branch (tomorrow) that works on the "real data" side of things and does two things:
- formalizes my "bricks_in_healpixels" look-up code and puts it in (the terribly named) desitarget.geomask module where all of my utility functions live. You're correct: there is a hit in time for nside=64 (it's about 4x). On the other hand, it's embarrassingly parallel, so I can probably get it down to under 30 seconds even at nside=64.
- creates a list=True/False option for the quantities_ function that returns the necessary files to run the function (see the sketch after this list).
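A hypothetical sketch of that second bullet, using the file layout quoted in the PR description at the bottom of this thread (the real function lives in desitarget.randoms; the name, signature and option spelling here are illustrative only):

import os

def quantities_in_a_brick(brickname, drdir, filelist=False):
    """Look up imaging quantities for one brick, or just list the files needed."""
    # The coadd types and filters the maskbits/depth look-up needs
    # (per the PR description).
    fns = [os.path.join(drdir, 'coadd', brickname[:3], brickname,
                        'legacysurvey-{}-{}-{}.fits'.format(brickname, ftype, filt))
           for ftype in ('nexp', 'depth', 'galdepth', 'psfsize')
           for filt in 'grz']
    if filelist:
        # Return the necessary files without reading anything.
        return fns
    # ...otherwise, read the files and compute the per-brick quantities...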
We'll then merge that branch once we agree that some test cases work for your needs. As you don't touch (much) outside of the mock directory in this PR (so far!), you should then easily be able to merge master into your branch and pick up my changes. Sound good?
Sounds perfect. This is basically the same problem as the local random generation in terms of having the necessary files - assuming you cater for productions in healpix. Thanks!
The subtle difference is that I cater for parallelizing by brick and then bundle bricks into jobs based on which HEALPixel a brick center lies in. This is different (I think) from how the mocks were set up, which actually parallelize directly across HEALPixels. So, if I don't have a brick, I just don't run it at all. No matter, we'll get there!
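A short sketch of that bundling scheme, with toy brick names and centers (nside and the NESTED ordering are assumptions):

import numpy as np
import healpy as hp
from collections import defaultdict

nside = 8

# Toy stand-ins for the bricks we actually have on disk.
bricknames = ['0184m117', '0186m117', '0184m115']
brickra = np.array([18.4, 18.6, 18.4])
brickdec = np.array([-11.7, -11.7, -11.5])

# Bundle bricks into jobs by the HEALPixel containing each brick center.
pixels = hp.ang2pix(nside, brickra, brickdec, lonlat=True, nest=True)
jobs = defaultdict(list)
for pix, name in zip(pixels, bricknames):
    jobs[pix].append(name)  # one job per HEALPixel; missing bricks never appear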
@michaelJwilson I'm just starting to review this, but it looks like your PR is based on a fork of desitarget rather than a branch. I'm actually not sure how to proceed...
That's right. I had started using branches on the repo but got burned after some poor git practice and went back to forks. Should I just try to start the same branch on desitarget? It will probably screw up the PR, but that's probably not important.
I talked to Stephen. The recommended approach is to add you as a collaborator on the fork, which I've just done. I don't see any settings box restricting pushing etc., so I think that's enabled by default. Let me know if there are any problems with that.
@geordie666 I ran the script using brick_names_touch_hp etc. and "successfully" copied the files "required" for the maskbits generation for a given healpix to my scratch, but I'm generating a lot of … I'm also having problems further downstream with dr_extension in select_mock_targets when pointed to this directory, but I'm still getting to the bottom of that.
I have some code to look up the north/south bricks, but it does require that the basic directory structure is in place for a Legacy Surveys DR. For example:
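A schematic of that structure, extrapolated from the dr8 path quoted in the PR description at the bottom of this thread (file extensions and other per-release details are assumptions):

/global/project/projectdirs/cosmo/data/legacysurvey/dr8/
    survey-bricks.fits.gz
    north/coadd/<aaa>/<brickname>/legacysurvey-<brickname>-<type>-<filt>.fits
    south/coadd/<aaa>/<brickname>/legacysurvey-<brickname>-<type>-<filt>.fits

where <aaa> is the first three characters of <brickname>, <type> runs over nexp, depth, galdepth and psfsize, and <filt> over g, r, z.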
Practically, for a laptop user, this would require you to copy the relevant files for both hemispheres and make sure that they're in the same place in whatever you're setting up as your local legacy_dir.
Ok, great, thanks. I'm requiring survey-bricks to be present for maskbits runs with select_mock_targets in any case. I had handled this by removing any empty directories in a clean-up step afterwards. It's a bit crude, but given yours wasn't a one-liner, it might suffice. legacy.py under mock/ now sets up a legacy_dir to be pointed at by 'legacy_dir' in the config.yaml. I've successfully run this to create the directory & files, pointed at it, and reduced healpixel 6195 with the legacy features enabled. I'm going to call that enough of a success for a week & leave it for @moustakas and others to get back to me on any changes. Thanks all, have a great weekend.
I've finally had a chance to review the proposed changes--thank you for the work to date, @michaelJwilson. I think I still don't understand something fundamental about how you've approached the problem vs. what I proposed on a recent telecon. We may need to chat offline to get on the same page, but briefly, I propose: generate the imaging properties (depths, MASKBITS, etc.) at the positions of the mock/random sources in a separate, NERSC-side step, and have select_mock_targets read those lightweight files rather than the brick-level coadds themselves.
Someone working on a laptop will easily be able to do small-scale tests without having to copy over an entire data release (well, at least not the brick-level coadd files).
Is there a reason this approach wouldn't work?
@@ -92,13 +92,13 @@ if rank == 0:
     keep = list()
     for i, pixnum in enumerate(pixels):
         truthspecfile = mockio.findfile('truth', args.nside, pixnum,
-                                        basedir=args.output_dir)
+                                        basedir=args.output_dir, obscon='dark')
         if not os.path.exists(truthspecfile):
We should definitely make sure that mock.io.findfile matches how the "real" target catalogs are organized, but I thought we already had a dark/bright organization implemented.
There is, but I ran into errors due to a bug on this. I'd have to refresh my memory by comparing to master.
 if not os.path.exists(truthfile):
     keep.append(i)

-log.info('{}/{} pixels remaining to do'.format(len(keep), len(healpixels)))
+log.info('{}/{} pixels remaining to do. {}'.format(len(keep), len(healpixels), healpixels[keep]))
 healpixels = healpixels[keep]
I don't understand the intent of the third number.
This was a lazy way for me to know the healpixels actually being processed, as opposed to the ones I requested, given that some had already been generated and were on disk. I was playing games like submitting the second half of that list as a separate job to run in tandem. Obviously, there are more sophisticated approaches, and it doesn't need to be kept.
params['legacy_dir'] = None

log.info('Assuming simple homogeneous depths & no masking.')
I don't think we can realistically expect people to copy over entire brick-sized coadd files to their laptop when all we need are the mean properties at the location of a finite set of random/mock sources.
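For concreteness, a minimal sketch of that per-source alternative, sampling one brick's depth map at the source positions (the file name follows the pattern in the PR description; the HDU index and the toy positions are assumptions):

import numpy as np
from astropy.io import fits
from astropy.wcs import WCS

# One brick's depth coadd (HDU index may vary by release/compression).
hdu = fits.open('legacysurvey-0184m117-depth-g.fits')[0]
wcs = WCS(hdu.header)

# Mock/random source positions (degrees) falling inside this brick.
ra = np.array([18.38, 18.42])
dec = np.array([-11.72, -11.68])

# Sample the map at the nearest pixel to each source.
x, y = wcs.world_to_pixel_values(ra, dec)
depth_g = hdu.data[np.round(y).astype(int), np.round(x).astype(int)]
# Writing out just (ra, dec, depth_g, ...) is tiny compared to the coadd stack.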
Yeah, it'd have been better had we been able to coordinate more at the start. Both @geordie666 and I didn't see this as particularly useful, but we understood it to be the thing that was compatible with "being able to run on a laptop". I.e., we tried to also support running/developing the pre-burner on a laptop, by requiring only the minimal number of files given a list of healpixels. Maybe you're willing to give up on running the pre-burner on a laptop; this wasn't clear to me before. I certainly follow the logic of trying to separate the two steps, and this probably wouldn't be hard.
My one "concern", which I think you're remembering from the telecon, is that I think you have in mind populating PSF depths etc. for something like the mocks that currently exist - e.g. ~20 of them, including dark sky, etc. - to get something DESI-like. There is an additional use case of applying just the mask to ~1000s of (approximate) mocks in the clustering analysis, for which it's less clear that we want redundant information saved to disk. I'd suggest it makes sense to develop select_mock_targets in such a way that it's a one-stop shop for both exercises. I.e., in minimal mode, it turns an (ra, dec, z) mock into something accepted by downstream pipeline code, but with flags etc. that let you dial up the sophistication of the effects included. There's a discussion planned on this at the cosmosim group telecon on Thursday.
@@ -0,0 +1,66 @@
# Parameter file for desitarget/bin/select_mock_targets, including contaminants.
The contaminants can just be commented out of the configuration file. That said, maybe it'd be worth adding a --no-contaminants optional input to select_mock_targets to overwrite whatever's in the config file.
I always find myself doing this, so such a flag (or yaml option) would be useful in my opinion.
@@ -0,0 +1,155 @@
20:35:22
Logs like this should not be checked into the repo. You'll need to scrub your commit history in order to remove this.
I'm still trying to break some ingrained bad habits, hence I work on my fork to start. Happy to clean that up for the final version.
@@ -0,0 +1,5 @@
date +"%T"
This shell script also doesn't belong in the desitarget repo.
# Default MASKBITS for all mock targets to non-zero. If populating with realistic depths
# these will later be overwritten.
targets['MASKBITS'] = 2**10
The default MASKBITS should not be BAILOUT (which we're aiming not to even need for DR9); it should be initialized to zero with the appropriate datatype, and then replaced along with the other imaging quantities.
Yes, sorry, you're obviously correct. This was meant to be a fix so that when quantities_at ... returned None for a failed brick, or outside the footprint, these targets would automatically be flagged as BAILOUT. This was @geordie666's suggestion and seemed neater than catching the None etc. cases. Clearly I messed it up; to retain the previous functionality, I should have had MASKBITS default to zero unless a realistic imaging mode was requested, in which case default to something non-zero unless told otherwise?
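A sketch of that corrected default (int16 for MASKBITS, and the realistic_imaging/failed names, are assumptions; the 2**10 BAILOUT bit is from the diff above):

import numpy as np
from astropy.table import Table

# Toy stand-ins: three mock targets, the third in a failed brick.
targets = Table({'RA': [10.0, 11.0, 12.0], 'DEC': [0.0, 0.5, 1.0]})
realistic_imaging = True
failed = np.array([False, False, True])

# Initialize to zero with a definite datatype, per the review comment.
targets['MASKBITS'] = np.zeros(len(targets), dtype=np.int16)

if realistic_imaging:
    # Flag look-up failures (failed brick, or outside the footprint) as
    # BAILOUT instead of silently removing those targets.
    targets['MASKBITS'][failed] = 2**10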
@@ -340,7 +346,38 @@ def mw_dust_extinction(self, Rv=3.1):
         extinction = Rv * ext_odonnell(self.wave, Rv=Rv)
         return extinction

-    def imaging_depth(self, data):
+    def set_wise_depth(self, data):
+        import astropy.units as u
#347 makes sense to me & seems a natural demarcation.
bandkeep = ['NOBS', 'PSFDEPTH', 'GALDEPTH']

_ = empty_targets_table(nobj=1)
dtypes = dict(zip(_.columns, [_[x].dtype for x in _.columns]))
I hate to be nitpicky, but the offset/column-aligned equals signs (among other changes) are a significant change in style relative to the surrounding code. A good rule is to use the style of the existing code, while new modules can use your own personal style (although DESI code generally tries to conform to the PEP8 guidelines).
It's clear there are going to be significant changes to get this into desihub, and fair enough - this doesn't need to be one of them. This was my version, and I find the alignment helps me parse dense parts of the code.
depthkey = 'PSFDEPTH_{}'.format(band)

-sigma = 1 / np.sqrt(data[depthkey][indx]) / 5  # nanomaggies, 1-sigma
+sigma = 1 / np.sqrt(data[depthkey][indx]) / 5      # nanomaggies, 1-sigma
 targets[fluxkey][:] = truth[fluxkey] + rand.normal(scale=sigma)
Cf. my previous comment. There was no need to change existing, functional code to a different style.
I think we should close this stale branch. @moustakas: You reviewed this work extensively, so I'm giving you a chance to holler before I close the branch.
Fine with closing.
Opening this as requested. I can verify it works for my use case, but it may not otherwise be ready for primetime.
This populates mock targets with MASKBITS and GAL/PSF depths early in the processing. This is motivated primarily by mock targets inheriting the DESI imaging mask. But, when done this way, it naturally allows for inhomogeneity in the mock target catalogue.
To your points, John: I've removed the pandas dependency - it was just convenient for testing.
On the broader question of maintainability on a laptop: there is a look-up of legacy coadds, e.g.:
/global/project/projectdirs/cosmo/data/legacysurvey/dr8/south/coadd/018/0184m117/legacysurvey-{brick}m117-{type}-{filt}.fits
where type is currently required for each of 'nexp', 'depth', 'galdepth' & 'psfsize', and filt runs over g, r, z. The 'image' type is not required, as we're excluding the local sky-brightness lookup here.
If a local copy of a release (e.g. /global/project/projectdirs/cosmo/data/legacysurvey/dr8/) for the required bricks were present, in principle this could be supported. I'm not sure how practical that is.
I had in mind a --maskbits flag that, if not provided, defaults to the previous "simple" (homogeneous) imaging model with no lookup; otherwise, the code assumes you're at NERSC and proceeds. This would then only exclude laptop debugging of this particular feature.
If you want to go down the route of local copies, I think I should be able to wire the config such that if you provide e.g. a LEGACYSURVEYDIR that is your local clone of a release, the code proceeds with the maskbits etc. lookup.
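A minimal sketch of that wiring, reusing the params['legacy_dir'] = None default and the log message from the diff earlier in this thread (the config.yaml name is from this thread; everything else is illustrative):

import yaml

# Load the select_mock_targets configuration.
with open('config.yaml') as f:
    params = yaml.safe_load(f)

# No legacy_dir means falling back to the simple homogeneous model.
legacy_dir = params.get('legacy_dir', None)
if legacy_dir is None:
    print('Assuming simple homogeneous depths & no masking.')
else:
    # Hypothetical: point the maskbits / realistic-depth look-up here.
    print('Looking up maskbits & depths under {}'.format(legacy_dir))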
There are a few other changes in there that I've found useful for debugging the 1% work, but some are probably not essential. Comments welcome.